Mining Generalised Emerging Patterns
Abstract
Emerging Patterns (EPs) are a data mining model for discovering the distinctions inherently present amongst a collection of datasets. However, current EP mining algorithms do not handle attributes whose values are associated with taxonomies (is-a hierarchies); they are restricted to the leaf-level attribute values of a taxonomy. In this paper, we formally introduce the problem of mining generalised emerging patterns: given a large dataset in which some attributes are hierarchical, we find emerging patterns that consist of items at any level of the taxonomies. Generalised EPs are more concise and interpretable when used to describe the distinctive characteristics of a class of data. They are also considered more expressive because they include items at higher levels of the hierarchies, which have larger supports than items at the leaf level. We formulate the problem of mining generalised EPs and present an algorithm for this task. Experimental results on ten benchmark datasets demonstrate that the discovered generalised patterns, which contain items at higher levels in the hierarchies, have greater support than traditional leaf-level EPs.
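To make the idea concrete, the following is a minimal sketch of how the support and growth rate of a generalised pattern could be computed by extending each transaction with the ancestors of its items. It is not the algorithm presented in the paper; the function names, the child-to-parent dictionary encoding of the taxonomy, and the toy data are all assumptions made for illustration. The growth rate follows the usual EP convention (support in the target class divided by support in the background class, with 0/0 taken as 0 and x/0 as infinity).

```python
from typing import Dict, FrozenSet, List, Set


def ancestors(item: str, parent: Dict[str, str]) -> Set[str]:
    """Return every ancestor of `item` in a child -> parent taxonomy (hypothetical encoding)."""
    found: Set[str] = set()
    while item in parent:
        item = parent[item]
        found.add(item)
    return found


def generalise(transaction: Set[str], parent: Dict[str, str]) -> FrozenSet[str]:
    """Extend a transaction with the ancestors of each of its items."""
    extended = set(transaction)
    for item in transaction:
        extended |= ancestors(item, parent)
    return frozenset(extended)


def support(pattern: FrozenSet[str], dataset: List[Set[str]], parent: Dict[str, str]) -> float:
    """Fraction of transactions whose generalised form contains the pattern."""
    if not dataset:
        return 0.0
    hits = sum(1 for t in dataset if pattern <= generalise(t, parent))
    return hits / len(dataset)


def growth_rate(pattern: FrozenSet[str], background: List[Set[str]],
                target: List[Set[str]], parent: Dict[str, str]) -> float:
    """Support in the target class divided by support in the background class."""
    s_bg = support(pattern, background, parent)
    s_tg = support(pattern, target, parent)
    if s_bg == 0.0:
        return float("inf") if s_tg > 0.0 else 0.0
    return s_tg / s_bg


# Toy taxonomy and two classes of transactions (illustrative data only).
parent = {"apple": "fruit", "pear": "fruit", "carrot": "vegetable"}
target = [{"apple", "milk"}, {"pear", "milk"}, {"apple", "bread"}, {"carrot", "milk"}]
background = [{"carrot", "milk"}, {"carrot", "bread"}, {"pear", "milk"}, {"carrot", "milk"}]

leaf_pattern = frozenset({"apple", "milk"})
general_pattern = frozenset({"fruit", "milk"})

print(support(leaf_pattern, target, parent))      # support of the leaf-level pattern
print(support(general_pattern, target, parent))   # support of the generalised pattern
print(growth_rate(general_pattern, background, target, parent))
```

On this toy data the generalised pattern {fruit, milk} covers more target transactions than the leaf-level pattern {apple, milk}, which illustrates why patterns built from higher-level items tend to have larger support.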
Similar resources
High Fuzzy Utility Based Frequent Patterns Mining Approach for Mobile Web Services Sequences
High fuzzy utility based pattern mining is an emerging topic in data mining. It refers to discovering all patterns whose utility meets a user-specified minimum high utility threshold. It comprises extracting patterns which are highly accessed in mobile web service sequences. Different from the traditional fuzzy approach, high fuzzy utility mining considers not only counts of mob...
Generalised interaction mining: probabilistic, statistical and vectorised methods in high dimensional or uncertain databases
Knowledge Discovery in Databases (KDD) is the non-trivial process of identifying valid, novel, useful and ultimately understandable patterns in data. The core step of the KDD process is the application of Data Mining (DM) algorithms to efficiently find interesting patterns in large databases. This thesis concerns itself with three inter-related themes: Generalised interaction and rule mining; the i...
A New Algorithm for High Average-utility Itemset Mining
High utility itemset mining (HUIM) is an emerging field in data mining which has gained growing interest due to its various applications. The goal of this problem is to discover all itemsets whose utility exceeds a minimum threshold. The basic HUIM problem does not consider the length of itemsets in its utility measurement, and utility values tend to become higher for itemsets containing more items...
Understanding Temporal Human Mobility Patterns in a City by Mobile Cellular Data Mining, Case Study: Tehran City
Recent studies have shown that complex urban behaviors like human mobility should be examined by newer and smarter methods. The ubiquitous use of mobile phones and other smart communication devices gives us access to larger amounts of data that can be browsed by the hours of the day, the days of the week, geographic area, meteorological conditions, and so on. In this article, mobile cellular data mi...
Efficient mining of interesting emerging patterns and their effective use in classification
Knowledge Discovery in Databases (KDD), or Data Mining, is used to discover interesting or useful patterns and relationships in data, with an emphasis on large volumes of observational data. Among the many other types of information (knowledge) that can be discovered in data, patterns that are expressed in terms of features are popular because they can be understood and used directly by people. ...